Conducting Vessel Data Imputation Method Selection Based on Dataset Characteristics

Abstract

Time series datasets collected from marine sensors inevitably suffer from missing data problems. This makes the sensor data unreliable for assisting the decision-making process. Many methods are offered to impute missing values. However, selecting the best imputation method is not a trivial task, as it usually requires domain expertise and several trial-and-error iterations. Furthermore, when imputation is carried out in a careless way, it generates a high error factor that can lead stakeholders to wrong assumptions. This paper provides a systematic approach that is able to extract characteristics of the underlying dataset and, based on them, recommends the less error-prone imputation method. We evaluate our proposed approach using nine real-world vessel datasets. In total, we generated 3859 samples consisting of 17 inputs and 1 target feature. Experimental results show that the approach is capable of obtaining a weighted F1-Score of 92.6%. Additionally, compared with the application of the selected imputation methods, our work can gain up to 86% in average score, with the worst case being 5%. We empirically demonstrate that the approach is efficient at recommending imputation methods.
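The general idea of building one training sample (dataset characteristics as inputs, least-error imputation method as target) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not list the 17 input characteristics or the candidate imputation methods, so the three characteristics and the three simple imputers (mean fill, forward fill, linear interpolation) are hypothetical stand-ins, and the classifier that would be trained on such samples is omitted.

```python
import math

def extract_characteristics(series):
    """Hypothetical characteristics of a series whose gaps are marked as None."""
    observed = [x for x in series if x is not None]
    longest_gap = run = 0
    for x in series:
        run = run + 1 if x is None else 0   # length of the current gap
        longest_gap = max(longest_gap, run)
    mean = sum(observed) / len(observed)
    var = sum((x - mean) ** 2 for x in observed) / len(observed)
    return {
        "missing_ratio": 1 - len(observed) / len(series),
        "longest_gap": longest_gap,
        "std": math.sqrt(var),
    }

def impute_mean(series):
    observed = [x for x in series if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in series]

def impute_ffill(series):
    # Forward fill; a leading gap falls back to the first observed value.
    last = next(x for x in series if x is not None)
    out = []
    for x in series:
        if x is not None:
            last = x
        out.append(last)
    return out

def impute_linear(series):
    known = [i for i, x in enumerate(series) if x is not None]
    out = list(series)
    for i, x in enumerate(series):
        if x is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]          # gap at the start
        elif right is None:
            out[i] = series[left]           # gap at the end
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out

# Hypothetical candidate set; the paper's actual methods are not given here.
IMPUTERS = {"mean": impute_mean, "ffill": impute_ffill, "linear": impute_linear}

def label_best_method(complete, masked_idx):
    """Hide known values, impute them, and return the lowest-MAE method."""
    masked = [None if i in masked_idx else x for i, x in enumerate(complete)]
    errors = {}
    for name, impute in IMPUTERS.items():
        filled = impute(masked)
        errors[name] = sum(abs(filled[i] - complete[i]) for i in masked_idx) / len(masked_idx)
    return masked, min(errors, key=errors.get), errors

# One illustrative sample: a linear ramp with three artificially hidden values.
complete = [float(i) for i in range(10)]
masked, best, errors = label_best_method(complete, {3, 4, 7})
features = extract_characteristics(masked)
print(features, best, errors)
```

Repeating this over many series and masking patterns would yield a table of (characteristics, best method) pairs on which a classifier could be trained and scored, for instance with a weighted F1-Score as reported in the abstract.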

Similar resources

Machine Learning Based Missing Value Imputation Method for Clinical Dataset

Missing value imputation is one of the biggest tasks of data pre-processing when performing data mining. Most medical datasets are usually incomplete. Simply removing the cases from the original datasets can bring more problems than solutions. A suitable method for missing value imputation can help to produce good quality datasets for better analysing clinical trials. In this paper we explore t...

Traffic Speed Data Imputation Method Based on Tensor Completion

Traffic speed data plays a key role in Intelligent Transportation Systems (ITS); however, missing traffic data would affect the performance of ITS as well as Advanced Traveler Information Systems (ATIS). In this paper, we handle this issue by a novel tensor-based imputation approach. Specifically, tensor pattern is adopted for modeling traffic speed data and then High accurate Low Rank Tensor C...

Missing Value Imputation Based on Data Clustering

We propose an efficient nonparametric missing value imputation method based on clustering, called CMI (Clustering-based Missing value Imputation), for dealing with missing values in target attributes. In our approach, we impute the missing values of an instance A with plausible values that are generated from the data in the instances which do not contain missing values and are most similar to t...

Robust Tree-Based Incremental Imputation Method for Data Fusion

Data Fusion and Data Grafting are concerned with combining files and information coming from different sources. The problem is not to extract data from a single database, but to merge information collected from different sample surveys. The typical data fusion situation is formed of two data samples, the former made up of a complete data matrix X relative to a first survey, and the latter Y which ...

Tumor Gene Characteristics Selection Method Based on Multi-Agent

For tumor gene expression profile data, which are high-dimensional with small sample sizes, effectively selecting the classification features of samples from among thousands of genes is a difficult problem in the analysis of tumor gene expression profiles. The method first partitions the data set evenly into K divisions, uses the Lasso method to perform feature selection on each one respectively, and then merges each s...

Journal

Journal title: IOP conference series

Year: 2023

ISSN: 1757-899X, 1757-8981

DOI: https://doi.org/10.1088/1755-1315/1198/1/012017